Abstract

The KL-optimality criterion is a very general method for selecting the best experimental designs to discriminate between different statistical models. A KL-optimum design is obtained from a minimax optimization problem defined on an infinite-dimensional space. In this paper, the continuity of the KL-optimality criterion is proved under mild conditions.
1 Introduction

In the presence of families of competing models, the choice of the best experimental conditions to discriminate between them is one of the main tasks of optimum experimental design. In [1, 2], the authors proposed the T-optimality criterion to discriminate between two, or more, competing Gaussian models. This criterion was extended in [11] to dynamic heteroscedastic Gaussian models. More recently, the KL-criterion, which is based on the Kullback-Leibler divergence, has been introduced; it is very general and may be seen as an extension of the previous criterion (see [7, 8], and [10] for discrimination between several models). In a discrimination problem, a KL-optimum design maximizes the power function in the worst case (see [7]).

In this first version of the paper, we show the continuity of the KL-optimality criterion 
under mild conditions. 

2 Statistical setting of the KL-optimality criterion: notation and framework

Let $\Theta_1$ and $\Theta_2$ be the parameter sets of two different statistical models. In other words, two parametric families of conditional density functions $f_1(y \mid x; \beta_1)$ and $f_2(y \mid x; \beta_2)$ are given, where

• $x$ belongs to the compact experimental domain $\mathcal{X} \subset \mathbb{R}^q$, $q \geq 1$;

• $\Theta_i$ is an open set of $\mathbb{R}^{p_i}$, $i = 1, 2$;¹

• $f_i(y \mid x; \beta_i)$ is the probability density of the response of the experiment under the experimental condition $x$ and model $i$, $i = 1, 2$.

¹ We stress that the parameter spaces $\Theta_1$ and $\Theta_2$ are not required to be compact.

The KL-optimality criterion is built from the Kullback-Leibler divergence between the two conditional distributions $f_1(y \mid x; \beta_1)$ and $f_2(y \mid x; \beta_2)$:

$$I(x, \beta_1, \beta_2) = \int_{\mathcal{Y}} \log \frac{f_1(y \mid x; \beta_1)}{f_2(y \mid x; \beta_2)} \, f_1(y \mid x; \beta_1) \, dy. \tag{1}$$

The quantity in Equation (1) is known to be non-negative, and it is zero if and only if the two conditional densities coincide almost everywhere. The Kullback-Leibler divergence is often called a distance, although it is not symmetric and does not satisfy the triangle inequality. In this context, the Kullback-Leibler divergence in Equation (1) measures the dissimilarity between the two models with parameters $\beta_1$ and $\beta_2$ when the experimental condition is $x$. We require this quantity to be continuous with respect to the experimental condition.
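As a numerical illustration of these properties, the following minimal Python sketch evaluates (1) at a fixed experimental condition for a hypothetical pair of Gaussian responses, $f_1 = N(\mu_1, \sigma_1^2)$ and $f_2 = N(\mu_2, \sigma_2^2)$, for which the divergence has the standard closed form $\log(\sigma_2/\sigma_1) + (\sigma_1^2 + (\mu_1 - \mu_2)^2)/(2\sigma_2^2) - 1/2$. The Gaussian choice and the numerical values are illustrative assumptions, not part of the models above. The sketch compares the closed form with a midpoint-rule approximation of the integral and shows that swapping the two densities changes the value.

```python
import math


def log_gauss_pdf(y, mu, sigma):
    """Log-density of N(mu, sigma^2) at y (the log form avoids underflow in the tails)."""
    return -0.5 * ((y - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2.0 * math.pi))


def kl_gauss_closed(mu1, s1, mu2, s2):
    """Closed-form Kullback-Leibler divergence of N(mu1, s1^2) from N(mu2, s2^2)."""
    return math.log(s2 / s1) + (s1 ** 2 + (mu1 - mu2) ** 2) / (2.0 * s2 ** 2) - 0.5


def kl_numeric(mu1, s1, mu2, s2, lo=-15.0, hi=15.0, n=30000):
    """Midpoint-rule approximation of int log(f1/f2) f1 dy over the truncated range [lo, hi]."""
    h = (hi - lo) / n
    total = 0.0
    for k in range(n):
        y = lo + (k + 0.5) * h
        log_f1 = log_gauss_pdf(y, mu1, s1)
        log_ratio = log_f1 - log_gauss_pdf(y, mu2, s2)
        total += log_ratio * math.exp(log_f1) * h
    return total


if __name__ == "__main__":
    print(kl_gauss_closed(0.0, 1.0, 1.0, 2.0), kl_numeric(0.0, 1.0, 1.0, 2.0))
    # swapping the two densities gives a different value: the divergence is not symmetric
    print(kl_gauss_closed(1.0, 2.0, 0.0, 1.0), kl_numeric(1.0, 2.0, 0.0, 1.0))
    print(kl_gauss_closed(0.0, 1.0, 0.0, 1.0))  # identical densities: 0.0
```

Working with log-densities keeps the ratio in (1) finite even where both densities underflow to zero in floating point.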

Assumption 1. The Kullback-Leibler divergence $I(x, \beta_1, \beta_2)$ given in Equation (1) is continuous with respect to $x$.

A design $\xi$ is a probability distribution with support in $\mathcal{X}$, chosen to discriminate between the two statistical models. When a design $\xi$ is chosen, every experiment is performed under experimental conditions distributed as $\xi$. In a discrimination problem, $\xi$ should be chosen to maximize the "separation" between the two statistical models.

A design that maximizes the power function in the worst case is a maximizer of the KL-optimality criterion proposed in [7]:

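$$I_{2,1}(\xi; \beta_1) = \inf_{\beta_2 \in \Theta_2} \int_{\mathcal{X}} \int_{\mathcal{Y}} \log \frac{f_1(y \mid x; \beta_1)}{f_2(y \mid x; \beta_2)} \, f_1(y \mid x; \beta_1) \, dy \, d\xi(x). \tag{2}$$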

For a given value $\beta_1 \in \Theta_1$ of the parameter of the first model, criterion (2) is the infimum over $\beta_2 \in \Theta_2$ of the Kullback-Leibler distance between the joint distribution $f_1(y \mid x; \beta_1)\,\xi(x)$ and the joint statistical model $f_2(y \mid x; \beta_2)\,\xi(x)$.
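For a discrete design with support points $x_1, \dots, x_m$ and weights $w_1, \dots, w_m$, the outer integral in (2) becomes a weighted sum. The Python sketch below evaluates (2) under purely hypothetical modelling assumptions: homoscedastic Gaussian responses with known $\sigma$, mean $\beta_1 x^2$ under the first model and $\beta_2 x$ under the second, so that $I(x, \beta_1, \beta_2) = (\beta_1 x^2 - \beta_2 x)^2 / (2\sigma^2)$. In this special case the infimum over $\beta_2$ is a weighted least-squares problem with minimizer $\beta_2^* = \beta_1 \sum_i w_i x_i^3 / \sum_i w_i x_i^2$; a brute-force grid search is included as a check.

```python
def kl_term(x, beta1, beta2, sigma=1.0):
    """I(x, beta1, beta2) for the hypothetical models N(beta1 x^2, sigma^2) and N(beta2 x, sigma^2)."""
    return (beta1 * x ** 2 - beta2 * x) ** 2 / (2.0 * sigma ** 2)


def criterion(points, weights, beta1, sigma=1.0):
    """Return (I_{2,1}(xi; beta1), minimizing beta2) for the design sum_i w_i delta_{x_i}."""
    s2 = sum(w * x ** 2 for x, w in zip(points, weights))
    s3 = sum(w * x ** 3 for x, w in zip(points, weights))
    if s2 == 0.0:
        # all the mass sits at x = 0, where both means vanish: the value is 0 for every beta2
        return 0.0, 0.0
    beta2_star = beta1 * s3 / s2  # weighted least-squares solution
    value = sum(w * kl_term(x, beta1, beta2_star, sigma) for x, w in zip(points, weights))
    return value, beta2_star


def criterion_grid(points, weights, beta1, sigma=1.0, lo=-5.0, hi=5.0, steps=10000):
    """Brute-force check: minimum of the weighted divergence over an equispaced beta2 grid."""
    best = float("inf")
    for k in range(steps + 1):
        beta2 = lo + (hi - lo) * k / steps
        best = min(best, sum(w * kl_term(x, beta1, beta2, sigma) for x, w in zip(points, weights)))
    return best


if __name__ == "__main__":
    pts, wts = [0.0, 0.5, 1.0], [0.25, 0.5, 0.25]
    print(criterion(pts, wts, beta1=1.0))  # approximately (1/96, 5/6)
    print(criterion_grid(pts, wts, beta1=1.0))
```

The grid search can only approach the closed-form infimum from above, which is why it serves as a consistency check rather than as the computation itself.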

3 Continuity of the KL-optimality criterion 

In this section we study the continuity of the KL-optimality criterion (2) with respect to the design $\xi$. We start by endowing the set $S_{\mathcal{X}}$ of probability distributions $\xi$ with support in $\mathcal{X} \subset \mathbb{R}^q$ with a metric $d_W$ which metrizes weak convergence on $\mathcal{X}$. We take the Kantorovich-Wasserstein metric (see [5]):

$$d_W(\xi_1, \xi_2) = \inf\{ E(|X_1 - X_2|) : X_1 \sim \xi_1,\ X_2 \sim \xi_2 \}.$$
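On the real line ($q = 1$) the infimum over couplings has a simple solution: for two empirical distributions with the same number of equally weighted atoms, pairing the sorted samples (the monotone, or quantile, coupling) is optimal. The Python sketch below, written under that one-dimensional, equal-weight assumption, computes $d_W$ this way and checks it against a brute-force search over all one-to-one pairings of the atoms, which by the Birkhoff-von Neumann theorem include an optimal coupling in this case.

```python
from itertools import permutations


def w1_sorted(xs, ys):
    """d_W between two equally weighted empirical distributions on the real line.

    In one dimension the monotone coupling, which pairs the order statistics, is optimal.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("need two non-empty samples of equal size")
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


def w1_brute_force(xs, ys):
    """Minimum of E|X1 - X2| over all one-to-one pairings of the atoms (small samples only)."""
    n = len(xs)
    return min(sum(abs(xs[i] - ys[p[i]]) for i in range(n)) / n for p in permutations(range(n)))


if __name__ == "__main__":
    a = [0.9, 0.1, 0.4, 0.7]
    b = [0.2, 0.8, 0.3, 0.5]
    print(w1_sorted(a, b), w1_brute_force(a, b))
```

In higher dimensions no such sorting shortcut exists, and computing $d_W$ requires solving a transport problem.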
Since $\mathcal{X}$ is compact, the metric space $(S_{\mathcal{X}}, d_W)$, which is infinite-dimensional, is complete and compact. In fact, any sequence of probability distributions on $\mathcal{X}$ is tight and hence, by Prokhorov's Theorem, it admits a convergent subsequence in $(S_{\mathcal{X}}, d_W)$.

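Denote by $J(\xi, \beta_1, \beta_2)$ the average of the function $I(x, \beta_1, \beta_2)$ in (1) with respect to the probability measure $\xi$, namely

$$J(\xi, \beta_1, \beta_2) = \int_{\mathcal{X}} I(x, \beta_1, \beta_2)\, d\xi(x) = \int_{\mathcal{X}} \int_{\mathcal{Y}} \log \frac{f_1(y \mid x; \beta_1)}{f_2(y \mid x; \beta_2)} \, f_1(y \mid x; \beta_1) \, dy \, d\xi(x).$$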
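For a discrete design, $J$ reduces to a weighted sum of divergences, and it is linear in $\xi$, which can be checked directly on mixtures $\lambda \xi_1 + (1 - \lambda)\xi_2$. The Python sketch below uses a hypothetical homoscedastic Gaussian divergence $I(x, \beta_1, \beta_2) = (\beta_1 x^2 - \beta_2 x)^2/(2\sigma^2)$, an illustrative assumption rather than a model from the paper, and represents a design as a dictionary from support points to weights.

```python
def kl_term(x, beta1, beta2, sigma=1.0):
    """Hypothetical I(x, beta1, beta2) between N(beta1 x^2, sigma^2) and N(beta2 x, sigma^2)."""
    return (beta1 * x ** 2 - beta2 * x) ** 2 / (2.0 * sigma ** 2)


def J(design, beta1, beta2, sigma=1.0):
    """J(xi, beta1, beta2) = sum_i w_i I(x_i, beta1, beta2) for a design {x_i: w_i}."""
    return sum(w * kl_term(x, beta1, beta2, sigma) for x, w in design.items())


def mixture(d1, d2, lam):
    """The design lam * d1 + (1 - lam) * d2, merging weights at common support points."""
    out = {}
    for x, w in d1.items():
        out[x] = out.get(x, 0.0) + lam * w
    for x, w in d2.items():
        out[x] = out.get(x, 0.0) + (1.0 - lam) * w
    return out


if __name__ == "__main__":
    xi1 = {0.2: 0.5, 1.0: 0.5}
    xi2 = {0.5: 0.7, 1.0: 0.3}
    lam, beta1, beta2 = 0.3, 1.0, 0.4
    lhs = J(mixture(xi1, xi2, lam), beta1, beta2)
    rhs = lam * J(xi1, beta1, beta2) + (1.0 - lam) * J(xi2, beta1, beta2)
    print(lhs, rhs)  # equal up to rounding
```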

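Again, since $\mathcal{X}$ is compact, Assumption 1 implies that the function $J$ is continuous with respect to $\xi$: the map $x \mapsto I(x, \beta_1, \beta_2)$ is continuous and hence bounded on $\mathcal{X}$, so weak convergence $\xi_n \to \xi$ gives $J(\xi_n, \beta_1, \beta_2) \to J(\xi, \beta_1, \beta_2)$. Moreover, this function is linear in $\xi$. A KL-optimal design, if it exists, is a distribution $\xi^*$ for which

$$I_{2,1}(\xi^*; \beta_1) = \sup_{\xi \in S_{\mathcal{X}}} \inf_{\beta_2 \in \Theta_2} J(\xi, \beta_1, \beta_2),$$

and therefore it may be seen as the solution of an infinite-dimensional minimax problem. Our goal is to prove that $I_{2,1}(\xi; \beta_1) = \inf_{\beta_2 \in \Theta_2} J(\xi, \beta_1, \beta_2)$ is continuous, extending classical results for semi-infinite problems (see, e.g., [9]) to our context. We start with a counterexample, which shows that Assumption 1 is not sufficient for $I_{2,1}$ to be continuous, even if $I$ is a continuous function on $\mathcal{X} \times \Theta_2$.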



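Example 1 ($I_{2,1}(\xi; \beta_1)$ is not continuous). Take $\mathcal{X} = [0, 1]$, $\Theta_2 = [0, \infty)$, and define

$$I(x, \beta_1, b) = \begin{cases} 2\big((2b - 1)x + (1 - b)\big) & \text{if } 0 \le b \le 1, \\ (b + 1)\, x^{b} & \text{if } 1 < b. \end{cases}$$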

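We have:

• $I(x, \beta_1, b)$ is a continuous function on $\mathcal{X} \times \Theta_2$;

• $I(x, \beta_1, b)$ is a convex function of $x$, for any $b \in \Theta_2$;

• $I_{2,1}(\delta_x; \beta_1) = 0$ for any $x \in \mathcal{X}$, where $\delta_x$ is the point mass at $x$ (take $b = 1$ if $x = 0$, $b = 0$ if $x = 1$, and let $b \to \infty$ if $0 < x < 1$).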

Let $\xi_n$ be the uniform distribution on $[0, 1 - 1/n]$; it is easily proved that $d_W(\xi_n, \xi) \to 0$, where $\xi$ is the uniform distribution on $[0, 1]$. We have

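$$\int_{\mathcal{X}} I(x, \beta_1, b)\, d\xi_n(x) = \int_0^{1 - 1/n} \frac{I(x, \beta_1, b)}{1 - 1/n}\, dx = \begin{cases} 1 + \dfrac{1 - 2b}{n} & \text{if } 0 \le b \le 1, \\[1ex] \left(1 - \dfrac{1}{n}\right)^{b} & \text{if } 1 < b, \end{cases}$$

while

$$\int_{\mathcal{X}} I(x, \beta_1, b)\, d\xi(x) = \int_0^1 I(x, \beta_1, b)\, dx = 1 \quad \text{for every } b \in \Theta_2.$$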
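Hence, since $(1 - 1/n)^b \to 0$ as $b \to \infty$,

$$I_{2,1}(\xi_n; \beta_1) = \inf_{b \in \Theta_2} \int_{\mathcal{X}} I(x, \beta_1, b)\, d\xi_n(x) = 0 \neq 1 = \inf_{b \in \Theta_2} \int_{\mathcal{X}} I(x, \beta_1, b)\, d\xi(x) = I_{2,1}(\xi; \beta_1).$$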
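The computations in Example 1 can be checked numerically. The Python sketch below approximates the integrals with a midpoint rule and compares them with the closed forms $1 + (1 - 2b)/n$ for $0 \le b \le 1$ and $(1 - 1/n)^b$ for $b > 1$. It also evaluates $d_W(\xi_n, \xi)$ through the one-dimensional identity $d_W(\xi_1, \xi_2) = \int |F_1(t) - F_2(t)|\, dt$, where $F_1, F_2$ are the distribution functions; for these uniform distributions this gives $d_W(\xi_n, \xi) = 1/(2n)$. The grid sizes are arbitrary choices, and a finite list of $b$ values can only suggest, not prove, that the infimum under $\xi_n$ is $0$.

```python
def I_example(x, b):
    """I(x, beta_1, b) of Example 1 (beta_1 is fixed and suppressed)."""
    if 0.0 <= b <= 1.0:
        return 2.0 * ((2.0 * b - 1.0) * x + (1.0 - b))
    return (b + 1.0) * x ** b


def mean_uniform(b, a, m=20000):
    """Midpoint approximation of the integral of I(x, b) against the uniform distribution on [0, a]."""
    h = a / m
    return sum(I_example((k + 0.5) * h, b) for k in range(m)) * h / a


def closed_form_xi_n(b, n):
    """Closed form of the integral of I(x, b) against xi_n = U[0, 1 - 1/n], for b >= 0."""
    if b <= 1.0:
        return 1.0 + (1.0 - 2.0 * b) / n
    return (1.0 - 1.0 / n) ** b


def cdf_uniform(t, a):
    """Distribution function of U[0, a] at t."""
    return min(max(t / a, 0.0), 1.0)


def w1_uniform_vs_unit(a, m=20000):
    """d_W(U[0, a], U[0, 1]) for 0 < a <= 1, as the integral of |F_a - F_1| over [0, 1]."""
    h = 1.0 / m
    return sum(abs(cdf_uniform((k + 0.5) * h, a) - cdf_uniform((k + 0.5) * h, 1.0)) for k in range(m)) * h


if __name__ == "__main__":
    n = 10
    a = 1.0 - 1.0 / n
    print("d_W(xi_n, xi):", w1_uniform_vs_unit(a), "vs 1/(2n) =", 1.0 / (2 * n))
    for b in [0.0, 0.5, 1.0, 2.0, 5.0, 50.0, 200.0]:
        print(b, mean_uniform(b, a), closed_form_xi_n(b, n), mean_uniform(b, 1.0))
    # under xi_n the values decrease towards 0 as b grows; under xi they stay at 1
```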

We now give a mild assumption which is satisfied in most situations. When $\beta_1$ is fixed, we can expect the Kullback-Leibler divergence $I(x, \beta_1, \beta_2)$ to be "dominated" in $\beta_2$, in the sense that if $I(x, \beta_1, \beta_2)$ is too large for some $x$, then there is another model that is closer to $\beta_1$ at every $x$ and whose divergence is bounded by a constant $M(\beta_1)$.

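Assumption 2. For any fixed $\beta_1$, there exists $M = M(\beta_1) > 0$ such that, if $I(x, \beta_1, \beta_2) > M$ for some $x \in \mathcal{X}$, then there exists $\tilde\beta_2 \in \Theta_2$ such that

$$I(x, \beta_1, \beta_2) \ge I(x, \beta_1, \tilde\beta_2) \quad \forall x \in \mathcal{X}$$

and

$$\sup_{x \in \mathcal{X}} I(x, \beta_1, \tilde\beta_2) \le M.$$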
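Assumption 2 is not satisfied in Example 1, which is consistent with the failure of continuity there. For $b > 1$ one has $\sup_{x} I(x, \beta_1, b) = b + 1$, and near $x = 0$ the function $(b + 1)x^b$ is smaller than $I(x, \beta_1, \tilde b)$ for every $\tilde b < b$, so no parameter $\tilde b$ with $\sup_x I(x, \beta_1, \tilde b) \le M < b + 1$ can dominate it. The Python sketch below is only a grid check of this argument, not a proof: it searches a hypothetical grid of candidates $\tilde b \in [0, 10]$ and a grid of $x$ values for a dominating parameter with supremum at most $M$.

```python
def I_example(x, b):
    """I(x, beta_1, b) of Example 1 (beta_1 is fixed and suppressed)."""
    if 0.0 <= b <= 1.0:
        return 2.0 * ((2.0 * b - 1.0) * x + (1.0 - b))
    return (b + 1.0) * x ** b


# Grid on X = [0, 1], refined near 0 where (b + 1) x^b is smallest (an arbitrary choice).
XS = sorted(set([i / 200.0 for i in range(201)] + [10.0 ** (-k) for k in range(1, 7)]))
# Hypothetical candidate parameters b~ in [0, 10].
CANDIDATES = [i / 100.0 for i in range(1001)]


def sup_over_grid(b, xs=XS):
    """Grid approximation of sup_x I(x, beta_1, b)."""
    return max(I_example(x, b) for x in xs)


def find_dominating(b, M, xs=XS, candidates=CANDIDATES):
    """Return a candidate b~ with I(., b) >= I(., b~) on xs and sup I(., b~) <= M, or None."""
    for bt in candidates:
        # the domination test comes first: it usually fails at the smallest x values
        if all(I_example(x, b) >= I_example(x, bt) for x in xs) and sup_over_grid(bt, xs) <= M:
            return bt
    return None


if __name__ == "__main__":
    for M in [3.0, 5.0, 8.0]:
        b = M  # sup_x I(x, beta_1, b) = M + 1 > M
        print(M, find_dominating(b, M))  # no dominating candidate is found on the grid
```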

Theorem 1. Under Assumptions 1 and 2, the KL-criterion $I_{2,1}(\xi; \beta_1)$ is [a locally Lipschitz function and hence] a continuous function of $\xi$.



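Proof. Let $\beta_1$ be fixed and let $M$ be as in Assumption 2. Define

$$\Theta_2^M = \Big\{ \beta_2 \in \Theta_2 : \sup_{x \in \mathcal{X}} I(x, \beta_1, \beta_2) \le M \Big\}. \tag{3}$$

For any $\xi \in S_{\mathcal{X}}$, the KL-criterion may be rewritten as

$$I_{2,1}(\xi; \beta_1) = \inf_{\beta_2 \in \Theta_2^M} \int_{\mathcal{X}} I(x, \beta_1, \beta_2)\, d\xi(x), \tag{4}$$

by Assumption 2: every $\beta_2 \notin \Theta_2^M$ is dominated at every $x$ by some $\tilde\beta_2 \in \Theta_2^M$, and integrating this inequality with respect to $\xi$ shows that discarding $\beta_2$ does not change the infimum.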
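The reduction in (4) can be seen on a hypothetical example in which Assumption 2 does hold. Take, purely for illustration, homoscedastic Gaussian models with $\sigma = 1$, means $\beta_1 x^2$ and $\beta_2 x$, $\beta_1 = 1$ and $\mathcal{X} = [0, 1]$, so that $I(x, \beta_1, \beta_2) = (x^2 - \beta_2 x)^2/2$. For $x \in [0, 1]$, every $\beta_2 < 0$ is dominated by $\tilde\beta_2 = 0$ and every $\beta_2 > 1$ by $\tilde\beta_2 = 1$, while $\sup_x I(x, \beta_1, \beta_2) \le 1/2$ for $\beta_2 \in [0, 1]$, so $M = 1/2$ works. The Python sketch below checks the domination on grids and verifies that restricting the infimum to $\Theta_2^M = \{\beta_2 : \sup_x I(x, \beta_1, \beta_2) \le M\}$ leaves the criterion unchanged for one arbitrary design.

```python
def kl_term(x, beta2, beta1=1.0, sigma=1.0):
    """Hypothetical I(x, beta1, beta2) for N(beta1 x^2, sigma^2) versus N(beta2 x, sigma^2)."""
    return (beta1 * x ** 2 - beta2 * x) ** 2 / (2.0 * sigma ** 2)


XS = [i / 100.0 for i in range(101)]                  # grid on X = [0, 1]
BETA2_GRID = [i / 100.0 for i in range(-1000, 1001)]  # grid on Theta_2, here [-10, 10]
M = 0.5


def sup_over_x(beta2):
    """Grid approximation of sup_x I(x, beta1, beta2)."""
    return max(kl_term(x, beta2) for x in XS)


def dominates(beta2, beta2_tilde):
    """True if I(x, beta1, beta2) >= I(x, beta1, beta2_tilde) at every grid point x."""
    return all(kl_term(x, beta2) >= kl_term(x, beta2_tilde) for x in XS)


def inf_over(design, betas):
    """Minimum over the given beta2 values of sum_i w_i I(x_i, beta1, beta2)."""
    return min(sum(w * kl_term(x, b) for x, w in design.items()) for b in betas)


THETA_M = [b for b in BETA2_GRID if sup_over_x(b) <= M]

if __name__ == "__main__":
    # Assumption 2 on the grid: each beta2 with a large divergence is dominated by 0 or by 1
    violators = [b for b in BETA2_GRID if sup_over_x(b) > M]
    print(all(dominates(b, 0.0) or dominates(b, 1.0) for b in violators))
    design = {0.2: 0.3, 0.6: 0.4, 1.0: 0.3}
    print(inf_over(design, BETA2_GRID), inf_over(design, THETA_M))  # the two infima coincide
```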

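Let $\mathcal{M}_{\mathcal{X}}$ be the real vector space of all finite signed measures on $\mathcal{X}$ (equipped with the usual Borel $\sigma$-algebra $\mathcal{B}$), which contains $S_{\mathcal{X}}$ as a proper, closed, convex subset. In [12] it is proved that $\mathcal{M}_{\mathcal{X}}^+$ (the restriction of $\mathcal{M}_{\mathcal{X}}$ to positive measures) is metrisable by a complete metric, which may be chosen (see [5]) as the one induced by

$$\|\xi\|_{\mathcal{M}_{\mathcal{X}}} = \sup\left\{ \left| \int_{\mathcal{X}} h(x)\, d\xi(x) \right| : \|h\|_{L} \le 1 \right\}.$$

The vector space $\mathcal{M}_{\mathcal{X}}$, equipped with the norm above, is hence a Banach space itself.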

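The map $\xi \mapsto \int_{\mathcal{X}} I(x, \beta_1, \beta_2)\, d\xi(x)$ is a linear functional on $\mathcal{M}_{\mathcal{X}}$. Since $\mathcal{X}$ is compact, Assumption 1 guarantees its boundedness on the unit ball $\{\xi : \|\xi\|_{\mathcal{M}_{\mathcal{X}}} \le 1\}$:

$$\left| \int_{\mathcal{X}} I(x, \beta_1, \beta_2)\, d\xi(x) \right| \le \int_{\mathcal{X}} \sup_{x \in \mathcal{X}} |I(x, \beta_1, \beta_2)|\, d|\xi|(x) = \sup_{x \in \mathcal{X}} |I(x, \beta_1, \beta_2)| \; |\xi|(\mathcal{X}),$$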
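and hence $\xi \mapsto \int_{\mathcal{X}} I(x, \beta_1, \beta_2)\, d\xi(x)$ is a continuous functional. The function

$$\xi \mapsto \inf_{\beta_2 \in \Theta_2^M} \int_{\mathcal{X}} I(x, \beta_1, \beta_2)\, d\xi(x)$$

is concave and upper semi-continuous, since it is the pointwise infimum of linear continuous functions. Moreover, $-\infty < \inf_{\beta_2 \in \Theta_2^M} \int_{\mathcal{X}} I(x, \beta_1, \beta_2)\, d\xi(x) < \infty$ since, for any $\beta_2 \in \Theta_2^M$,

$$\left| \int_{\mathcal{X}} I(x, \beta_1, \beta_2)\, d\xi(x) \right| \le M \, |\xi|(\mathcal{X})$$

as a consequence of (3). Therefore (see [3]) the function

$$\xi \mapsto \inf_{\beta_2 \in \Theta_2^M} \int_{\mathcal{X}} I(x, \beta_1, \beta_2)\, d\xi(x)$$

is locally Lipschitz and hence continuous on the Banach space $(\mathcal{M}_{\mathcal{X}}, \|\cdot\|_{\mathcal{M}_{\mathcal{X}}})$.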
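The concavity of $\xi \mapsto \inf_{\beta_2} \int_{\mathcal{X}} I\, d\xi$ noted in the proof can be observed numerically. Under hypothetical homoscedastic Gaussian models with known $\sigma$ and means $\beta_1 x^2$ and $\beta_2 x$, an illustrative assumption only, the infimum over $\beta_2 \in \mathbb{R}$ has the least-squares closed form $F(\xi) = \frac{\beta_1^2}{2\sigma^2}\big(\sum_i w_i x_i^4 - (\sum_i w_i x_i^3)^2 / \sum_i w_i x_i^2\big)$. The Python sketch below evaluates it on random mixtures of two designs with a fixed, arbitrarily chosen support and checks $F(\lambda\xi_1 + (1-\lambda)\xi_2) \ge \lambda F(\xi_1) + (1-\lambda)F(\xi_2)$.

```python
import random

SUPPORT = [0.1, 0.3, 0.5, 0.7, 1.0]  # fixed support points in X = [0, 1] (arbitrary choice)


def F(weights, beta1=1.0, sigma=1.0):
    """inf over beta2 of sum_i w_i (beta1 x_i^2 - beta2 x_i)^2 / (2 sigma^2), via least squares."""
    s2 = sum(w * x ** 2 for x, w in zip(SUPPORT, weights))
    s3 = sum(w * x ** 3 for x, w in zip(SUPPORT, weights))
    s4 = sum(w * x ** 4 for x, w in zip(SUPPORT, weights))
    return beta1 ** 2 * (s4 - s3 ** 2 / s2) / (2.0 * sigma ** 2)


def random_design(rng):
    """Random weights on SUPPORT, all positive and summing to 1."""
    raw = [rng.random() + 1e-3 for _ in SUPPORT]
    total = sum(raw)
    return [r / total for r in raw]


def concavity_gap(rng):
    """F(mixture) - (lam F(xi1) + (1 - lam) F(xi2)); non-negative when F is concave."""
    w1, w2, lam = random_design(rng), random_design(rng), rng.random()
    mix = [lam * a + (1.0 - lam) * b for a, b in zip(w1, w2)]
    return F(mix) - (lam * F(w1) + (1.0 - lam) * F(w2))


if __name__ == "__main__":
    rng = random.Random(0)
    gaps = [concavity_gap(rng) for _ in range(1000)]
    print(min(gaps) >= -1e-12)
```

A one-point design gives $F = 0$, since the second model can then match the first exactly; spreading the mass over several points is what makes the two models distinguishable.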

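Recall that $S_{\mathcal{X}}$ is the set of possible experimental designs $\xi$. When $\xi_1, \xi_2 \in S_{\mathcal{X}}$, the Wasserstein (or Kantorovich) distance can also be written as (see [6])

$$d_W(\xi_1, \xi_2) = \sup\left\{ \left| \int_{\mathcal{X}} h(x)\, d\xi_1(x) - \int_{\mathcal{X}} h(x)\, d\xi_2(x) \right| : \|h\|_{L} \le 1 \right\} = \|\xi_1 - \xi_2\|_{\mathcal{M}_{\mathcal{X}}}.$$

By (4), and since $d_W$ metrizes weak convergence, the KL-criterion (2),

$$I_{2,1}(\xi; \beta_1) = \inf_{\beta_2 \in \Theta_2} \int_{\mathcal{X}} I(x, \beta_1, \beta_2)\, d\xi(x) = \inf_{\beta_2 \in \Theta_2^M} \int_{\mathcal{X}} I(x, \beta_1, \beta_2)\, d\xi(x),$$

is a locally Lipschitz, and hence continuous, function on $S_{\mathcal{X}}$.

□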
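The dual representation of $d_W$ used in the proof can be checked in one dimension. There the primal value equals $\int |F_1(t) - F_2(t)|\, dt$, and a maximizing 1-Lipschitz function is $h(x) = \int_{x_0}^{x} \operatorname{sign}(F_2(t) - F_1(t))\, dt$, where $F_1, F_2$ are the distribution functions of $\xi_1, \xi_2$ and $x_0$ is the leftmost support point; this follows from $\int h\, d\xi_1 - \int h\, d\xi_2 = \int h'(t)\,(F_2(t) - F_1(t))\, dt$ for absolutely continuous $h$. The Python sketch below, restricted to discrete designs on the real line as an illustrative assumption, computes both sides exactly for step distribution functions.

```python
def step_cdf(design, t):
    """Distribution function at t of a discrete design {x_i: w_i}."""
    return sum(w for x, w in design.items() if x <= t)


def w1_primal(d1, d2):
    """Integral of |F1 - F2|, exact for step distribution functions."""
    pts = sorted(set(d1) | set(d2))
    total = 0.0
    for left, right in zip(pts[:-1], pts[1:]):
        # both distribution functions are constant on [left, right)
        total += abs(step_cdf(d1, left) - step_cdf(d2, left)) * (right - left)
    return total


def dual_witness(d1, d2):
    """Values at the support points of h(x) = int_{x0}^{x} sign(F2 - F1) dt, a 1-Lipschitz function."""
    pts = sorted(set(d1) | set(d2))
    h = {pts[0]: 0.0}
    for left, right in zip(pts[:-1], pts[1:]):
        diff = step_cdf(d2, left) - step_cdf(d1, left)
        sign = (diff > 0) - (diff < 0)
        h[right] = h[left] + sign * (right - left)
    return h


def w1_dual(d1, d2):
    """int h d(xi1) - int h d(xi2) for the witness h; it matches w1_primal."""
    h = dual_witness(d1, d2)
    return sum(h[x] * w for x, w in d1.items()) - sum(h[x] * w for x, w in d2.items())


if __name__ == "__main__":
    xi1 = {0.0: 0.5, 1.0: 0.5}
    xi2 = {0.5: 1.0}
    print(w1_primal(xi1, xi2), w1_dual(xi1, xi2))  # both 0.5
```

Because both designs have total mass 1, the additive constant in $h$ cancels, which is why the witness can start from $0$ at the leftmost support point.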



Remark 1. It is straightforward to extend the results proved here to the case in which the two statistical models $f_1(y \mid x; \beta_1)$ and $f_2(y \mid x; \beta_2)$ are conditional densities with respect to a general measure $\lambda$. For instance, discrete models are included.